Perf: Optimize CursorValues compare performance for StringViewArray (1.4X faster for sort-tpch Q11) #16509

Merged
merged 3 commits into from
Jun 24, 2025

zhuqi-lucas
Contributor

Which issue does this PR close?

Rationale for this change

Add a fast path to speed up CursorValues comparisons for StringViewArray.

What changes are included in this PR?

Add a fast path to the CursorValues comparison for StringViewArray that applies when the data buffers are empty (that is, every string is stored inline in its view). This avoids the length comparison performed by the logic that follows.
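
To illustrate why empty data buffers allow a cheaper comparison, here is a minimal, dependency-free sketch. It assumes the Arrow view layout: a 16-byte little-endian view whose low 4 bytes hold the length and whose remaining 12 bytes hold the inlined string, zero-padded. The helpers `make_inline_view`, `inline_eq`, `inline_key` and `inline_cmp` are hypothetical names chosen for this example; this is not the code merged in this PR.

```rust
use std::cmp::Ordering;

/// Build a 16-byte view for a string of at most 12 bytes (hypothetical helper).
/// Layout assumed here: bytes 0..4 = length (little-endian u32),
/// bytes 4..16 = string data, zero-padded.
fn make_inline_view(s: &[u8]) -> u128 {
    assert!(s.len() <= 12, "only strings of at most 12 bytes are inlined");
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(s.len() as u32).to_le_bytes());
    bytes[4..4 + s.len()].copy_from_slice(s);
    u128::from_le_bytes(bytes)
}

/// Equality on fully inlined views: the view holds both the length and all
/// of the data, so a single 128-bit comparison is enough.
fn inline_eq(l: u128, r: u128) -> bool {
    l == r
}

/// Rearrange a view into a key whose integer order matches the byte-wise
/// order of the strings: data bytes first (most significant), length last.
fn inline_key(view: u128) -> u128 {
    let raw = view.to_le_bytes();
    let len = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
    let mut key = [0u8; 16];
    key[..12].copy_from_slice(&raw[4..]);
    // The length only breaks ties between strings with identical padded data.
    key[12..].copy_from_slice(&len.to_be_bytes());
    u128::from_be_bytes(key)
}

/// Ordering on fully inlined views without touching any data buffer.
fn inline_cmp(l: u128, r: u128) -> Ordering {
    inline_key(l).cmp(&inline_key(r))
}
```

With these helpers, `inline_cmp(make_inline_view(b"a"), make_inline_view(b"ab"))` yields `Ordering::Less`: the padded data of `"a"` sorts before that of `"ab"`. Neither the equality check nor the ordering needs a separate length comparison or a buffer lookup.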

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Jun 23, 2025
@zhuqi-lucas
Contributor Author

Test result:

--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ fast_path_view ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │ 106.56 ms │      106.53 ms │     no change │
│ Q2           │ 107.52 ms │      104.28 ms │     no change │
│ Q3           │ 666.52 ms │      677.66 ms │     no change │
│ Q4           │ 128.45 ms │      127.80 ms │     no change │
│ Q5           │ 258.79 ms │      260.39 ms │     no change │
│ Q6           │ 271.88 ms │      271.11 ms │     no change │
│ Q7           │ 427.24 ms │      415.86 ms │     no change │
│ Q8           │ 305.82 ms │      303.95 ms │     no change │
│ Q9           │ 316.23 ms │      316.75 ms │     no change │
│ Q10          │ 470.93 ms │      467.82 ms │     no change │
│ Q11          │ 281.10 ms │      197.90 ms │ +1.42x faster │
└──────────────┴───────────┴────────────────┴───────────────┘
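
As a quick check of the headline number, the sketch below assumes the reported speedup is simply the baseline time divided by the branch time; the function name `speedup` is made up for this example and is not part of the benchmark script.

```rust
/// Ratio of baseline time to candidate time; values above 1.0 mean the
/// candidate is faster. Assumed definition, not taken from the script.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}
```

For Q11 above, `speedup(281.10, 197.90)` is roughly 1.42, which agrees with the `+1.42x faster` entry and the "1.4X" in the PR title.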

@zhuqi-lucas zhuqi-lucas changed the title from "Perf: Optimize CursorValues compare performance for StringViewArray (1.6 faster for sort-tpch Q11)" to "Perf: Optimize CursorValues compare performance for StringViewArray (1.4X faster for sort-tpch Q11)" Jun 23, 2025
@@ -293,14 +293,19 @@ impl CursorValues for StringViewArray {
self.views().len()
}

#[inline(always)]

For anyone else following along, I believe this is the same optimization as applied in

@alamb
Contributor

alamb commented Jun 23, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fast_path_view (f22ca3c) to 2bf8441 diff
Benchmarks: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Contributor

alamb commented Jun 23, 2025

🤖: Benchmark completed

Comparing HEAD and fast_path_view
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ fast_path_view ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │  1988.57 ms │     2142.83 ms │ 1.08x slower │
│ QQuery 1     │   718.95 ms │      707.35 ms │    no change │
│ QQuery 2     │  1381.41 ms │     1385.85 ms │    no change │
│ QQuery 3     │   648.24 ms │      686.89 ms │ 1.06x slower │
│ QQuery 4     │  1372.37 ms │     1370.82 ms │    no change │
│ QQuery 5     │ 15173.88 ms │    15217.15 ms │    no change │
│ QQuery 6     │  2082.96 ms │     2023.77 ms │    no change │
│ QQuery 7     │  2049.36 ms │     2028.40 ms │    no change │
│ QQuery 8     │   805.76 ms │      820.48 ms │    no change │
└──────────────┴─────────────┴────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)             │ 26221.51ms │
│ Total Time (fast_path_view)   │ 26383.55ms │
│ Average Time (HEAD)           │  2913.50ms │
│ Average Time (fast_path_view) │  2931.51ms │
│ Queries Faster                │          0 │
│ Queries Slower                │          2 │
│ Queries with No Change        │          7 │
│ Queries with Failure          │          0 │
└───────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ fast_path_view ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.26 ms │        2.37 ms │     no change │
│ QQuery 1     │    34.52 ms │       33.99 ms │     no change │
│ QQuery 2     │    82.25 ms │       81.73 ms │     no change │
│ QQuery 3     │    91.96 ms │      101.59 ms │  1.10x slower │
│ QQuery 4     │   670.29 ms │      595.46 ms │ +1.13x faster │
│ QQuery 5     │   924.74 ms │      848.52 ms │ +1.09x faster │
│ QQuery 6     │     2.29 ms │        2.23 ms │     no change │
│ QQuery 7     │    39.12 ms │       39.60 ms │     no change │
│ QQuery 8     │   875.61 ms │      869.89 ms │     no change │
│ QQuery 9     │  1203.04 ms │     1148.95 ms │     no change │
│ QQuery 10    │   256.35 ms │      258.61 ms │     no change │
│ QQuery 11    │   290.28 ms │      285.61 ms │     no change │
│ QQuery 12    │   911.90 ms │      874.02 ms │     no change │
│ QQuery 13    │  1264.11 ms │     1274.50 ms │     no change │
│ QQuery 14    │   835.77 ms │      826.75 ms │     no change │
│ QQuery 15    │   788.25 ms │      793.46 ms │     no change │
│ QQuery 16    │  1661.81 ms │     1673.49 ms │     no change │
│ QQuery 17    │  1613.20 ms │     1632.61 ms │     no change │
│ QQuery 18    │  2995.68 ms │     2969.08 ms │     no change │
│ QQuery 19    │    88.59 ms │       84.39 ms │     no change │
│ QQuery 20    │  1171.81 ms │     1139.65 ms │     no change │
│ QQuery 21    │  1320.01 ms │     1283.02 ms │     no change │
│ QQuery 22    │  2171.26 ms │     2095.38 ms │     no change │
│ QQuery 23    │  7442.25 ms │     7410.28 ms │     no change │
│ QQuery 24    │   472.94 ms │      453.60 ms │     no change │
│ QQuery 25    │   402.89 ms │      395.16 ms │     no change │
│ QQuery 26    │   526.97 ms │      514.73 ms │     no change │
│ QQuery 27    │  1579.76 ms │     1525.91 ms │     no change │
│ QQuery 28    │ 13130.51 ms │    11906.74 ms │ +1.10x faster │
│ QQuery 29    │   539.36 ms │      514.08 ms │     no change │
│ QQuery 30    │   800.47 ms │      777.46 ms │     no change │
│ QQuery 31    │   822.87 ms │      818.04 ms │     no change │
│ QQuery 32    │  2525.29 ms │     2559.95 ms │     no change │
│ QQuery 33    │  3219.31 ms │     3228.55 ms │     no change │
│ QQuery 34    │  3323.28 ms │     3244.18 ms │     no change │
│ QQuery 35    │  1282.15 ms │     1269.19 ms │     no change │
│ QQuery 36    │   122.16 ms │      122.21 ms │     no change │
│ QQuery 37    │    53.36 ms │       49.92 ms │ +1.07x faster │
│ QQuery 38    │   119.63 ms │      119.59 ms │     no change │
│ QQuery 39    │   203.97 ms │      196.17 ms │     no change │
│ QQuery 40    │    42.37 ms │       42.41 ms │     no change │
│ QQuery 41    │    38.85 ms │       38.38 ms │     no change │
│ QQuery 42    │    31.93 ms │       33.52 ms │  1.05x slower │
└──────────────┴─────────────┴────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)             │ 55975.43ms │
│ Total Time (fast_path_view)   │ 54134.99ms │
│ Average Time (HEAD)           │  1301.75ms │
│ Average Time (fast_path_view) │  1258.95ms │
│ Queries Faster                │          4 │
│ Queries Slower                │          2 │
│ Queries with No Change        │         37 │
│ Queries with Failure          │          0 │
└───────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ fast_path_view ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 103.20 ms │      103.84 ms │    no change │
│ QQuery 2     │  21.07 ms │       20.58 ms │    no change │
│ QQuery 3     │  32.68 ms │       32.00 ms │    no change │
│ QQuery 4     │  18.60 ms │       18.70 ms │    no change │
│ QQuery 5     │  50.00 ms │       48.53 ms │    no change │
│ QQuery 6     │  11.86 ms │       11.95 ms │    no change │
│ QQuery 7     │  86.84 ms │       88.45 ms │    no change │
│ QQuery 8     │  25.11 ms │       25.03 ms │    no change │
│ QQuery 9     │  54.79 ms │       54.82 ms │    no change │
│ QQuery 10    │  43.38 ms │       42.97 ms │    no change │
│ QQuery 11    │  11.47 ms │       11.33 ms │    no change │
│ QQuery 12    │  34.12 ms │       34.13 ms │    no change │
│ QQuery 13    │  25.94 ms │       26.54 ms │    no change │
│ QQuery 14    │   9.78 ms │        9.75 ms │    no change │
│ QQuery 15    │  18.83 ms │       19.84 ms │ 1.05x slower │
│ QQuery 16    │  18.81 ms │       18.78 ms │    no change │
│ QQuery 17    │  96.22 ms │       95.89 ms │    no change │
│ QQuery 18    │ 191.84 ms │      195.57 ms │    no change │
│ QQuery 19    │  24.92 ms │       25.16 ms │    no change │
│ QQuery 20    │  31.73 ms │       31.82 ms │    no change │
│ QQuery 21    │ 144.91 ms │      148.56 ms │    no change │
│ QQuery 22    │  15.07 ms │       15.09 ms │    no change │
└──────────────┴───────────┴────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary             ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)             │ 1071.19ms │
│ Total Time (fast_path_view)   │ 1079.35ms │
│ Average Time (HEAD)           │   48.69ms │
│ Average Time (fast_path_view) │   49.06ms │
│ Queries Faster                │         0 │
│ Queries Slower                │         1 │
│ Queries with No Change        │        21 │
│ Queries with Failure          │         0 │
└───────────────────────────────┴───────────┘

@alamb
Contributor

alamb commented Jun 23, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fast_path_view (f22ca3c) to 2bf8441 diff
Benchmarks: sort_tpch
Results will be posted here when complete

@alamb
Contributor

alamb commented Jun 23, 2025

🤖: Benchmark completed

Comparing HEAD and fast_path_view
--------------------
Benchmark sort_tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ fast_path_view ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │  350.37 ms │      347.24 ms │     no change │
│ Q2           │  301.59 ms │      300.50 ms │     no change │
│ Q3           │ 1164.84 ms │     1187.22 ms │     no change │
│ Q4           │  412.28 ms │      410.15 ms │     no change │
│ Q5           │  414.43 ms │      407.09 ms │     no change │
│ Q6           │  445.56 ms │      447.95 ms │     no change │
│ Q7           │  902.09 ms │      888.45 ms │     no change │
│ Q8           │  819.94 ms │      777.72 ms │ +1.05x faster │
│ Q9           │  848.95 ms │      801.57 ms │ +1.06x faster │
│ Q10          │ 1226.96 ms │     1196.23 ms │     no change │
│ Q11          │  750.22 ms │      694.11 ms │ +1.08x faster │
└──────────────┴────────────┴────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary             ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)             │ 7637.22ms │
│ Total Time (fast_path_view)   │ 7458.23ms │
│ Average Time (HEAD)           │  694.29ms │
│ Average Time (fast_path_view) │  678.02ms │
│ Queries Faster                │         3 │
│ Queries Slower                │         0 │
│ Queries with No Change        │         8 │
│ Queries with Failure          │         0 │
└───────────────────────────────┴───────────┘

Contributor

@alamb alamb left a comment

Thank you @zhuqi-lucas -- very nice

@Dandandan
Contributor

🤖: Benchmark completed

I had a look at the "regressions"; I think they should not be impacted by this change, so they are likely noise.

@comphead comphead merged commit 15a8738 into apache:main Jun 24, 2025
27 checks passed
Development

Successfully merging this pull request may close these issues.

Perf: Optimize CursorValues compare performance for StringViewArray
4 participants